Ordinary Web Pages as a Source for Metadata Acquisition for Open Corpus User Modeling

Authors

  • Michal Barla
  • Mária Bieliková
Abstract

Personalization and adaptivity on the Web as we know it today are often “closed” within a particular web-based system. As a result, there are only a few “personalized islands” within the whole Web. Spreading personalization to the whole Web, either via an enhanced proxy server or via an agent residing on the client side, raises the challenge of how to determine metadata within the open corpus Web domain that would allow efficient creation of an overlay user model. In this paper we present our approach to metadata acquisition for open corpus user modeling applicable to the “wild” Web, where we decided to use metadata in the form of keywords representing the visited web pages. We present the user modeling process (which is thus keyword-based), built on top of an enhanced proxy server capable of personalizing user browsing sessions via pluggable modules. The paper focuses on a comparison of algorithms and third-party services that extract the required keywords from ordinary web pages, which is a crucial step of our user modeling approach.
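The keyword-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the extraction algorithms or services compared, so plain term frequency is swapped in here as a stand-in, and the names `extract_keywords`, `update_user_model`, and `STOP_WORDS` are hypothetical. The sketch assumes English-language pages and uses only the Python standard library.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"the", "and", "for", "with", "that", "this", "are", "was"}


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_keywords(html, top_k=5):
    """Return the top_k (keyword, count) pairs of a page by raw term frequency."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Keep alphabetic tokens of at least three letters, minus stop words.
    tokens = [t for t in re.findall(r"[a-z]{3,}", text) if t not in STOP_WORDS]
    return Counter(tokens).most_common(top_k)


def update_user_model(model, keywords):
    """Accumulate the keyword counts of one visited page into the user model."""
    for word, count in keywords:
        model[word] += count
    return model


if __name__ == "__main__":
    page = "<html><body><p>Proxy server and proxy modules</p></body></html>"
    user_model = Counter()
    update_user_model(user_model, extract_keywords(page))
    print(user_model.most_common(3))
```

In a proxy-based setting such as the one the abstract describes, a function like `update_user_model` would be called once per page the user visits, so that frequently seen keywords gain weight in the model over time.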




Publication date: 2010